A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
We study the problem of optimizing a function under a budgeted number
of evaluations. We only assume that the function is locally smooth
around one of its global optima. The difficulty of optimization is measured in
terms of 1) the amount of noise in the function evaluations and 2)
the local smoothness of the function. A smaller value of the smoothness
measure results in smaller optimization error. We propose a new, simple, and
parameter-free approach. First, for all values of the noise and the local
smoothness, this approach recovers at least the
state-of-the-art regret guarantees. Second, our approach additionally obtains
these results while being agnostic to the values of both quantities.
This leads to the first algorithm that naturally adapts to an unknown
range of noise and leads to significant improvements in the moderate- and
low-noise regimes. Third, our approach also obtains a remarkable improvement
over the state-of-the-art SOO algorithm when the noise is very low, which
includes the case of optimization under deterministic feedback. There,
under our minimal local smoothness assumption, this improvement is of
exponential magnitude and holds for a class of functions that covers the vast
majority of functions that practitioners optimize. We show that our
algorithmic improvement is borne out in experiments, where we observe
faster convergence on common benchmarks.
Local Rademacher complexities
We propose new bounds on the error of learning algorithms in terms of a
data-dependent notion of complexity. The estimates we establish give optimal
rates and are based on a local and empirical version of Rademacher averages, in
the sense that the Rademacher averages are computed from the data, on a subset
of functions with small empirical error. We present some applications to
classification and prediction with convex function classes, and with kernel
classes in particular.
Comment: Published at http://dx.doi.org/10.1214/009053605000000282 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
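The idea in the abstract above, computing Rademacher averages from the data on a subset of functions with small empirical error, can be illustrated with a minimal sketch. It assumes a finite function class given as a table of values on the sample and enumerates every sign vector, which is only feasible for very small samples. The names `empirical_rademacher` and `small_error_subset`, the loss table, and the radius are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def empirical_rademacher(rows):
    """Exact empirical Rademacher average of a finite class.

    rows[j][i] is the value of the j-th function on the i-th sample point.
    The expectation over sign vectors is computed by enumerating all 2**n
    of them, so this is only practical for very small n.
    """
    n = len(rows[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        # Supremum over the class of the signed empirical average.
        total += max(sum(s * v for s, v in zip(signs, row)) for row in rows) / n
    return total / 2 ** n


def small_error_subset(rows, radius):
    """Keep the functions whose empirical mean is at most `radius`."""
    return [row for row in rows if sum(row) / len(row) <= radius]


# Hypothetical 0/1 losses of four classifiers on four sample points.
losses = [(0, 0, 1, 0), (1, 1, 0, 1), (0, 1, 0, 0), (1, 0, 1, 1)]
local = small_error_subset(losses, radius=0.25)
global_avg = empirical_rademacher(losses)
local_avg = empirical_rademacher(local)
```

Because the supremum is taken over a smaller set, `local_avg` can never exceed `global_avg`; this is the sense in which a localized complexity can give a tighter bound than the global one.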
Bounding Embeddings of VC Classes into Maximum Classes
One of the earliest conjectures in computational learning theory, the Sample
Compression conjecture, asserts that concept classes (equivalently, set systems)
admit compression schemes of size linear in their VC dimension. To date, this
statement is known to be true for maximum classes, that is, those that possess
maximum cardinality for their VC dimension. The most promising approach to positively
resolving the conjecture is by embedding general VC classes into maximum
classes without super-linear increase to their VC dimensions, as such
embeddings would extend the known compression schemes to all VC classes. We
show that maximum classes can be characterised by a local-connectivity property
of the graph obtained by viewing the class as a cubical complex. This geometric
characterisation of maximum VC classes is applied to prove a negative embedding
result which demonstrates VC-d classes that cannot be embedded in any maximum
class of VC dimension lower than 2d. On the other hand, we show that every VC-d
class C embeds in a VC-(d+D) maximum class where D is the deficiency of C,
i.e., the difference between the cardinalities of a maximum VC-d class and of
C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best possible
results on embedding into maximum classes. For some special classes of Boolean
functions, relationships with maximum classes are investigated. Finally, we give
a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum
classes for the smallest k.
Comment: 22 pages, 2 figures
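The notions of maximum class and deficiency used in the abstract above can be made concrete with a small brute-force sketch. It assumes a non-empty concept class represented as a set of binary tuples in the n-cube, and it takes the maximum cardinality of a VC-d class to be the Sauer bound, the sum of C(n, i) for i from 0 to d. The function names and the example classes are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
from math import comb


def vc_dimension(concepts, n):
    """Largest d such that some set of d coordinates is shattered.

    Assumes `concepts` is a non-empty collection of binary n-tuples.
    """
    best = 0
    for d in range(1, n + 1):
        shattered = any(
            len({tuple(c[i] for i in idx) for c in concepts}) == 2 ** d
            for idx in combinations(range(n), d)
        )
        if not shattered:
            break  # if no d-set is shattered, no larger set is either
        best = d
    return best


def maximum_size(n, d):
    """Sauer bound: cardinality of a maximum VC-d class in the n-cube."""
    return sum(comb(n, i) for i in range(d + 1))


def deficiency(concepts, n):
    """Gap between the Sauer bound and the size of the class."""
    return maximum_size(n, vc_dimension(concepts, n)) - len(set(concepts))


# All vectors in the 3-cube with at most one 1: a maximum class of VC dimension 1.
low_weight = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
# Dropping one concept keeps the VC dimension at 1 but leaves a deficiency of 1.
smaller = low_weight - {(0, 0, 1)}
```

For `smaller`, d = 1 and D = 1, so the embedding result stated in the abstract would place it in a maximum class of VC dimension d + D = 2. The sketch only computes d and D; it does not construct the embedding.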
Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) with a
large state space, so as to minimize average cost. Since it is intractable to
compete with the optimal policy for large-scale problems, we pursue the more
modest goal of competing with a low-dimensional family of policies. We use the
dual linear programming formulation of the MDP average cost problem, in which
the variable is a stationary distribution over state-action pairs, and we
consider a neighborhood of a low-dimensional subset of the set of stationary
distributions (defined in terms of state-action features) as the comparison
class. We propose two techniques, one based on stochastic convex optimization,
and one based on constraint sampling. In both cases, we give bounds that show
that the performance of our algorithms approaches the best achievable by any
policy in the comparison class. Most importantly, these results depend on the
size of the comparison class, but not on the size of the state space.
Preliminary experiments show the effectiveness of the proposed algorithms in a
queuing application.
Comment: 27 pages, 3 figures
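For reference, the dual linear program mentioned in the abstract above has the following standard textbook form for an average-cost MDP with finite state and action sets. The notation is an assumption made here for illustration and may differ from the paper's: $\mu(x,a)$ is the stationary distribution over state-action pairs, $c(x,a)$ the cost, and $P(x' \mid x,a)$ the transition kernel.

```latex
\begin{aligned}
\min_{\mu}\quad & \sum_{x,a} \mu(x,a)\, c(x,a) \\
\text{s.t.}\quad & \sum_{a} \mu(x',a) \;=\; \sum_{x,a} \mu(x,a)\, P(x' \mid x,a)
                   \quad \text{for every state } x', \\
                 & \sum_{x,a} \mu(x,a) = 1, \qquad \mu(x,a) \ge 0 .
\end{aligned}
```

The first constraint states that the state marginal of $\mu$ is invariant under the dynamics. The program has one variable per state-action pair, which is why restricting $\mu$ to a low-dimensional family, as the abstract describes, matters when the state space is large.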